The literature on hidden variables in quantum mechanics is now enormous,
and it may seem that little new can be said. Not everything in the present
article is new, but several things are. We have tried to collect together a
variety of results that go beyond the standard Clauser-Horne-Shimony-Holt
form of the Bell inequalities for four observables.

First, we state, and sketch the proof of, the fundamental theorem of the
collection we consider: there is a factoring hidden variable for a finite set 
of finite or continuous observables, i.e., random variables in the language 
of probability theory, if and only if the observables have a joint probability 
distribution. The physically important aspect of this theorem is that under 
very general conditions the existence of a hidden variable can be reduced 
completely to the relationship between the observables alone, namely, the 
problem of determining whether or not they have a joint probability distri- 
bution compatible with the given data, e.g., means, variances and correlations 
of the observables. 

We emphasize that although most of the literature is restricted to no more 
than second-order moments such as covariances and correlations, there is no 
necessity to make such a restriction. It is in fact violated in the fourth-order
moment that arises in the well-known Greenberger, Horne and Zeilinger (GHZ)
three- and four-particle configurations, which provide new Gedanken experiments
on hidden variables. For our probabilistic proof of an abstract GHZ result,
see Theorem 9.

As is familiar, Bell's results on hidden variables were mostly restricted to 
±1 observables, such as spin or polarization. But there is nothing essential 
about this restriction. Our general results cover any finite or continuous ob- 
servables (Theorem 1). We also state a useful theorem (Theorem 7) on func- 
tions of random variables, and give a partial corollary (Theorem 8) showing 
how such general probabilistic results are implicit in the reduction of higher 
spin cases to two-valued random variables in the physics literature. At the 
end we give various results on hidden variables for Gaussian observables and 
formulate as the final theorem a nonlinear inequality that is necessary and 
sufficient for three Gaussian random variables to have a joint distribution 
compatible with their given means, variances and correlations. 
Factorization. In the literature on hidden variables, what we call the
principle of factorization is sometimes baptized as a principle of locality. The
terminology is not really critical, but the meaning is. We have in mind a
quite general principle for random variables, continuous or discrete, which
is the following. Let $X_1, \ldots, X_n$ be random variables; then a necessary
and sufficient condition that there is a random variable $\boldsymbol{\lambda}$,
which is intended to be the hidden variable, such that $X_1, \ldots, X_n$ are
conditionally independent given $\boldsymbol{\lambda}$, is that there exists a
joint probability distribution of $X_1, \ldots, X_n$, without consideration of
$\boldsymbol{\lambda}$. This is our first theorem, which is the general
fundamental theorem relating hidden variables and joint probability
distributions of observable random variables.

Theorem 1 (Suppes & Zanotti; Holland & Rosenbaum) Let $n$ random variables
$X_1, \ldots, X_n$, finite or continuous, be given. Then there exists a hidden
variable $\boldsymbol{\lambda}$ such that there is a joint probability
distribution $F$ of $(X_1, \ldots, X_n, \boldsymbol{\lambda})$ with the properties

(i) $F(x_1, \ldots, x_n \mid \lambda) = P(X_1 \le x_1, \ldots, X_n \le x_n \mid \boldsymbol{\lambda} = \lambda)$,

(ii) conditional independence holds, i.e., for all $x_1, \ldots, x_n, \lambda$,

\[ F(x_1, \ldots, x_n \mid \lambda) = \prod_{j=1}^{n} F_j(x_j \mid \lambda), \]

if and only if there is a joint probability distribution of $X_1, \ldots, X_n$.
Moreover, $\boldsymbol{\lambda}$ may be constructed so as to be deterministic,
i.e., the conditional variance given $\boldsymbol{\lambda}$ of each $X_j$ is zero.

To be completely explicit in the notation,

\[ F_j(x_j \mid \lambda) = P(X_j \le x_j \mid \boldsymbol{\lambda} = \lambda). \tag{1} \]

Idea of the proof. Consider three ±1 random variables X, Y and Z. There
are 8 possible joint outcomes (±1, ±1, ±1). Let $p_{ijk}$ be the probability of
outcome $(i, j, k)$. Assign this probability to the value $\lambda_{ijk}$ of the
hidden variable $\boldsymbol{\lambda}$ we construct. Then the probability of the
quadruple $(i, j, k, \lambda_{ijk})$ is just $p_{ijk}$ and the conditional
probabilities are deterministic, i.e.,

\[ P(X = i, Y = j, Z = k \mid \lambda_{ijk}) = 1, \]

and factorization is immediate, i.e.,

\[ P(X = i, Y = j, Z = k \mid \lambda_{ijk}) = P(X = i \mid \lambda_{ijk})\, P(Y = j \mid \lambda_{ijk})\, P(Z = k \mid \lambda_{ijk}). \]

Extending this line of argument to the general case proves that the joint
probability distribution of the observables is sufficient for existence of the
factoring hidden variable. From the formulation of Theorem 1 necessity is
obvious, since the joint distribution of $(X_1, \ldots, X_n)$ is a marginal
distribution of the larger distribution of $(X_1, \ldots, X_n, \boldsymbol{\lambda})$.
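The construction in the proof idea can be traced through on a small example. The following is a minimal sketch, assuming an arbitrary (hypothetical) joint distribution of three ±1 observables; the names `joint`, `p_lambda`, `conditional` and `joint_given_lambda` are illustrative and not taken from the cited papers. Each hidden value is labelled by the outcome triple it determines.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution of three ±1 observables (weights are arbitrary).
outcomes = list(product([1, -1], repeat=3))
joint = {o: Fraction(w, 36) for o, w in zip(outcomes, range(1, 9))}

# One hidden value per joint outcome: lambda_ijk is labelled by the triple
# (i, j, k) itself and inherits the probability p_ijk.
p_lambda = dict(joint)

def conditional(index, value, lam):
    """P(observable number `index` equals `value` | lambda = lam); deterministic."""
    return Fraction(1) if lam[index] == value else Fraction(0)

def joint_given_lambda(outcome, lam):
    """P(X = i, Y = j, Z = k | lambda = lam) for outcome = (i, j, k)."""
    return Fraction(1) if outcome == lam else Fraction(0)

# Factorization: the conditional joint equals the product of the conditionals.
factorizes = all(
    joint_given_lambda(o, lam)
    == conditional(0, o[0], lam) * conditional(1, o[1], lam) * conditional(2, o[2], lam)
    for o in outcomes
    for lam in p_lambda
)

# Averaging over the hidden variable recovers the observable joint distribution.
recovered = {
    o: sum(p_lambda[lam] * joint_given_lambda(o, lam) for lam in p_lambda)
    for o in outcomes
}

print(factorizes, recovered == joint)
```

Because every conditional probability is 0 or 1, the conditional variance of each observable given the hidden value is zero, which is the deterministic property asserted in Theorem 1.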

It is obvious that the construction of $\boldsymbol{\lambda}$ is purely mathematical. It has
in itself no physical content. In fact, the proof itself is very simple. All the 
real mathematical difficulties are to be found in giving workable criteria for 
observables to have a joint probability distribution. As we remark in more 
detail later, we still do not have good criteria in the form of inequalities for 
necessary and possibly sufficient conditions for a joint distribution of three 
random variables with n > 2 finite values, as in higher spin cases. 

When additional physical assumptions are imposed on the hidden variable
$\boldsymbol{\lambda}$, then the physical content of $\boldsymbol{\lambda}$ goes
beyond the joint distribution
of the observables. A simple example is embodied in the following theorem
about two hidden variables. We impose an additional condition of symme- 
try on the conditional expectations, and then a hidden variable exists only 
if the correlation of the two observables is nonnegative, a strong additional 
restriction on the joint distribution. The proof of this theorem is found in 
the article cited with its statement. 

Theorem 2 (Suppes & Zanotti) Let X and Y be two-valued random
variables, for definiteness, with possible values 1 and −1, and with positive
variances, i.e., $\sigma(X), \sigma(Y) > 0$. In addition, let X and Y be
exchangeable, i.e.,

\[ P(X = 1, Y = -1) = P(X = -1, Y = 1). \]

Then a necessary and sufficient condition that there exist a hidden variable
$\boldsymbol{\lambda}$ such that

\[ E(XY \mid \boldsymbol{\lambda} = \lambda) = E(X \mid \boldsymbol{\lambda} = \lambda)\, E(Y \mid \boldsymbol{\lambda} = \lambda) \]

and

\[ E(X \mid \boldsymbol{\lambda} = \lambda) = E(Y \mid \boldsymbol{\lambda} = \lambda) \]

for every value $\lambda$ (except possibly on a set of measure zero) is that the
correlation of X and Y be nonnegative.
The informal statement of Theorems 1 and 2, which we call the Factorization
Theorems, is that the necessary and sufficient condition for the existence of
a factorizing hidden variable $\boldsymbol{\lambda}$ is just the existence of a
joint probability distribution of the given random variables $X_i$.

Often, in physics, as in the present paper, we are interested only in the
means, variances and covariances, which is called second-order probability
theory because we consider only second-order moments. We say that
a hidden variable $\boldsymbol{\lambda}$ satisfies the Second-Order Factorization
Condition with respect to the random variables $X_1, \ldots, X_n$ whose first two
moments exist if and only if

(a) $E(X_1 \cdots X_n \mid \lambda) = E(X_1 \mid \lambda) \cdots E(X_n \mid \lambda)$,

(b) $E(X_1^2 \cdots X_n^2 \mid \lambda) = E(X_1^2 \mid \lambda) \cdots E(X_n^2 \mid \lambda)$.

We then have as an immediate consequence of Theorem 1 the following.

Theorem 3 Let $n$ random variables, discrete or continuous, be given. If there
is a joint probability distribution of $X_1, \ldots, X_n$, then there is a
deterministic hidden variable $\boldsymbol{\lambda}$ such that
$\boldsymbol{\lambda}$ satisfies the Second-Order Factorization
Condition with respect to $X_1, \ldots, X_n$.

Locality. The next systematic concept we want to discuss is locality. We 
mean by locality what we think John Bell meant by locality in the following 
quotation from his well-known 1966 paper.

It is the requirement of locality, or more precisely that the re- 
sult of a measurement on one system be unaffected by operations 
on a distant system with which it has interacted in the past, that 
creates the essential difficulty. ... The vital assumption is that 
the result B for particle 2 does not depend on the setting a, of 
the magnet for particle 1, nor A on b. 

Although Theorems 1 and 2 are stated at an abstract level without any 
reference to space-time or other physical considerations, there is an implicit 
hypothesis of locality in their statements. To make the locality hypothesis 
explicit, we need to use additional concepts. For each random variable $X_i$,
we introduce a random vector $M_i$ of parameters for the local apparatus (in
space-time) used to measure the values of random variable $X_i$.

Definition 1 (Locality Condition I)

\[ E(X_i^k \mid M_i, M_j, \boldsymbol{\lambda}) = E(X_i^k \mid M_i, \boldsymbol{\lambda}), \]

where $k = 1, 2$, corresponding to the first two moments of $X_i$, $i \ne j$, and
$1 \le i, j \le n$.

Note that we consider only $M_j$ on the supposition that in a given experimental
run, only the correlation of $X_i$ with $X_j$ is being studied. Extension to
more variables, as considered in Theorem 7, is obvious. In many experiments
the direction of the measuring apparatus is the most important parameter
that is a component of $M_j$.

Definition 2 (Locality Condition II: Noncontextuality) The distribution
of $\boldsymbol{\lambda}$ is independent of the parameter values $M_i$ and $M_j$,
i.e., for all functions $g$ for which the expectations
$E(g(\boldsymbol{\lambda}))$ and $E(g(\boldsymbol{\lambda}) \mid M_i, M_j)$ are finite,

\[ E(g(\boldsymbol{\lambda})) = E(g(\boldsymbol{\lambda}) \mid M_i, M_j). \]

Here we follow [11]. In terms of Theorem 3, locality in the sense of Condition
I is required to satisfy the hypothesis of a fixed mean and variance for each
$X_i$. If experimental observation of $X_i$ when coupled with $X_j$ was
different from what was observed when coupled with $X_{j'}$, then the
hypothesis of constant means and variances would be violated. The restriction
of Locality Condition II must be satisfied in the construction of
$\boldsymbol{\lambda}$, and it is easy to check that it is.
This is often called, as indicated, Noncontextuality.

We embody these remarks in Theorem 4. 

Theorem 4 Let $n$ random variables $X_1, \ldots, X_n$ be given satisfying the
hypothesis of Theorem 2. Let $M_i$ be the vector of local parameters for
measuring $X_i$, and let each $X_i$ satisfy Locality Condition I. Then there is
a hidden variable $\boldsymbol{\lambda}$ satisfying Locality Condition II and the
Second-Order Factorization Condition if there is a joint probability
distribution of $X_1, \ldots, X_n$.
Inequalities for three random variables. The next theorem states two
conditions equivalent to an inequality condition given in [13] for three random
variables having just two values.

Theorem 5 Let three random variables X, Y and Z be given with values
±1 satisfying the symmetry condition $E(X) = E(Y) = E(Z) = 0$ and with
covariances $E(XY)$, $E(YZ)$ and $E(XZ)$ given. Then the following three
conditions are equivalent.

(i) There is a hidden variable $\boldsymbol{\lambda}$ with respect to X, Y and Z
satisfying Locality Condition II and the Second-Order Factorization Condition.

(ii) There is a joint probability distribution of the random variables X, Y
and Z compatible with the given means and covariances.

(iii) The random variables X, Y and Z satisfy the following inequalities.

\[ -1 \le E(XY) + E(YZ) + E(XZ) \le 1 + 2\,\mathrm{Min}(E(XY), E(YZ), E(XZ)). \]
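Condition (iii) is easy to test numerically. The sketch below checks the inequalities and, as an illustration, builds one candidate joint distribution for zero-mean ±1 variables. The candidate assumes that the third-order moment E(XYZ) is zero; that choice, like the function names `satisfies_iii` and `symmetric_joint`, is an assumption of this sketch and not part of the theorem. With that choice the eight probabilities are nonnegative exactly when (iii) holds.

```python
from fractions import Fraction
from itertools import product

def satisfies_iii(e_xy, e_yz, e_xz):
    """Inequalities (iii) of Theorem 5 for zero-mean ±1 random variables."""
    total = e_xy + e_yz + e_xz
    return -1 <= total <= 1 + 2 * min(e_xy, e_yz, e_xz)

def symmetric_joint(e_xy, e_yz, e_xz):
    """Candidate joint p(x, y, z) = (1 + xy E(XY) + yz E(YZ) + xz E(XZ)) / 8.

    Assumes E(X) = E(Y) = E(Z) = 0 and E(XYZ) = 0 (an assumption of this sketch).
    The values sum to 1 and reproduce the given product moments; they form a
    probability distribution only if all of them are nonnegative.
    """
    return {
        (x, y, z): (1 + x * y * e_xy + y * z * e_yz + x * z * e_xz) / 8
        for x, y, z in product([1, -1], repeat=3)
    }

half = Fraction(1, 2)
good = symmetric_joint(half, -half, -half)   # E(XY) = 1/2, E(YZ) = E(XZ) = -1/2
bad = symmetric_joint(-half, -half, -half)   # all three moments equal to -1/2

print(satisfies_iii(half, -half, -half), min(good.values()) >= 0)
print(satisfies_iii(-half, -half, -half), min(bad.values()) >= 0)
```

In the second case the candidate assigns a negative value to the outcome (1, 1, 1), in agreement with the failure of the left-hand inequality.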



There are several remarks to be made about this theorem, especially the
inequalities given in (iii). For discussion we introduce the standard
correlation, and its standard notation, for two random variables X and Y whose
variances are not zero:

\[ \rho(X, Y) = \frac{E(XY) - E(X)E(Y)}{\sigma(X)\,\sigma(Y)}, \]

where $\sigma(X)$, $\sigma(Y)$ are the standard deviations of X and Y, i.e., the
square roots of the variances:

\[ \sigma(X) = \sqrt{\sigma^2(X)} \]

and

\[ \sigma^2(X) = \mathrm{Var}(X) = E(X^2) - E(X)^2. \]

First, the explicit correlation notation $\rho(X, Y)$ is not standard in physics,
but is necessary here for comparing various theorems. The notation adopted
throughout this article conforms fairly closely to what is standard in
mathematical statistics.
Physicists use less general notation, because they often assume certain
symmetry conditions are satisfied, e.g., $E(X) = E(Y) = E(Z) = 0$. To make
these relations explicit, keeping in mind the earlier definition of
$\rho(X, Y)$, we have:

(i) Covariance of X and Y $= \mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y)$,

(ii) If $E(X) = E(Y) = 0$, then, clearly,

\[ \mathrm{Cov}(X, Y) = E(XY), \]

(iii) If X and Y are random variables whose only values are ±1 and
$E(X) = E(Y) = 0$, then

\[ \mathrm{Var}(X) = \mathrm{Var}(Y) = 1, \]

(iv) If the hypothesis of (iii) is satisfied,

\[ \rho(X, Y) = E(XY), \]

which is why in the physics literature $E(XY)$, with or without a comma
between X and Y, so commonly occurs. The statistical terminology for
$E(XY)$ is bivariate product moment $\mu_{11}$, which we shall often simply call
the bivariate product moment, without further notation.

Note that with the special symmetry conditions that $E(X) = E(Y) = E(Z) = 0$,
the inequalities (iii) of Theorem 5 for ±1 random variables can be written

\[ -1 \le \rho(X, Y) + \rho(Y, Z) + \rho(X, Z) \le 1 + 2\,\mathrm{Min}(\rho(X, Y), \rho(X, Z), \rho(Y, Z)). \tag{2} \]

Three Counterexamples. To show how special (iii) of Theorem 5, or the
equivalent (2) written in terms of correlation, is, because of the strong
symmetry assumptions, we now give three different examples that do not satisfy
these inequalities. The first is for ±1 random variables that do not have
expectations equal to zero. For this case neither the correlations nor the
covariances have linear inequalities, only the moments $E(XY)$. The second case
is for random variables with values −1, 0, 1 and zero expectations. An example
is given in which the inequalities are satisfied by the covariances but not by
the correlations. The third case is for random variables with values −2, 0, 2
and zero expectations. The inequalities of (iii) are not satisfied by the
covariances, which in this case are equal to the expectations $E(XY)$.

First, for the general case of ±1 random variables we have

\[ E(X) = x_0, \quad E(Y) = y_0, \quad E(Z) = z_0 \]

and

\[ -1 \le x_0, y_0, z_0 \le 1, \]

and it is straightforward to derive the analogue of (iii) of Theorem 5 for the
bivariate product moments, as well as the corresponding correlations, but
the expressions are more complicated for the correlations. We only give part
of the details here. We generalize on the derivation given in [13]. We need to
consider in detail the eight probabilities $p_{ijk}$ for $i, j, k = \pm 1$. When
referring to the marginals we use a dot for the missing random variable. For
example,

\[ p_{11\cdot} = P(X = 1, Y = 1), \qquad p_{0\cdot 1} = P(X = -1, Z = 1). \]

(For ease of typography we use 0 rather than −1 as a subscript.)
We note immediately the following equations:

\[ E(XY) = p_{11\cdot} - p_{10\cdot} - p_{01\cdot} + p_{00\cdot}, \]

\[ p_{10\cdot} + p_{01\cdot} = \frac{1 - E(XY)}{2}, \]

and correspondingly,

\[ p_{\cdot 10} + p_{\cdot 01} = \frac{1 - E(YZ)}{2}, \qquad p_{1\cdot 0} + p_{0\cdot 1} = \frac{1 - E(XZ)}{2}, \]

\[ p_{1\cdot\cdot} = p_{10\cdot} + p_{11\cdot} = \frac{x_0 + 1}{2}, \]

\[ p_{\cdot 1\cdot} = p_{11\cdot} + p_{01\cdot} = \frac{y_0 + 1}{2}, \]

\[ p_{\cdot\cdot 1} = p_{\cdot 11} + p_{\cdot 01} = \frac{z_0 + 1}{2}. \]

From these equations we easily derive

\[ p_{10\cdot} = \frac{x_0 - y_0}{4} + \frac{1 - E(XY)}{4}, \]

\[ p_{11\cdot} = \frac{1}{4} + \frac{x_0 + y_0 + E(XY)}{4}, \]

and similar expressions for $p_{\cdot 11}$, $p_{1\cdot 1}$, etc. Using these
equations, we may then derive

\[ p_{110} = \frac{1}{4} + \frac{x_0 + y_0 + E(XY)}{4} - p_{111}, \]

\[ p_{101} = \frac{1}{4} + \frac{x_0 + z_0 + E(XZ)}{4} - p_{111}, \]

\[ p_{011} = \frac{1}{4} + \frac{y_0 + z_0 + E(YZ)}{4} - p_{111}, \]

\[ p_{100} = p_{111} - \frac{y_0 + z_0}{4} - \frac{E(XY)}{4} - \frac{E(XZ)}{4}, \]

\[ p_{010} = p_{111} - \frac{x_0 + z_0}{4} - \frac{E(XY)}{4} - \frac{E(YZ)}{4}, \]

\[ p_{001} = p_{111} - \frac{x_0 + y_0}{4} - \frac{E(XZ)}{4} - \frac{E(YZ)}{4}, \]
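\[ p_{000} = \frac{1}{4} + \frac{E(XY) + E(YZ) + E(XZ)}{4} - p_{111}, \]

where the terms in $x_0$, $y_0$ and $z_0$ cancel when the other seven
probabilities are subtracted from 1. Since $p_{000} \ge 0$ and $p_{111} \ge 0$,

\[ 1 + E(XY) + E(YZ) + E(XZ) \ge 4p_{111} \ge 0. \]

The left-hand inequality of (iii) of Theorem 5 therefore carries over to
±1 random variables with arbitrary means when it is written in terms of the
bivariate product moments:

\[ E(XY) + E(YZ) + E(XZ) \ge -1. \tag{3} \]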

This result is much simpler than the corresponding one for correlation. We
have at once

\[ \rho(X, Y) = \frac{E(XY) - x_0 y_0}{\sqrt{1 - x_0^2}\,\sqrt{1 - y_0^2}}, \]

and so

\[ E(XY) = \sqrt{1 - x_0^2}\,\sqrt{1 - y_0^2}\;\rho(X, Y) + x_0 y_0. \]

Substituting the right-hand side for $E(XY)$, and the corresponding
expressions for $E(YZ)$ and $E(XZ)$, yields a rather complicated inequality in
terms of correlation, which we shall not write out here.
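The marginal identities used in this derivation hold for every joint distribution of three ±1 random variables, so they can be spot-checked on an example. The following is a minimal sketch with an arbitrarily chosen (hypothetical) distribution; the variable names are illustrative. Tuples use the actual values ±1, so the paper's subscript 0 corresponds to −1 here.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution of (X, Y, Z); the weights are arbitrary.
outcomes = list(product([1, -1], repeat=3))
p = {o: Fraction(w, 36) for o, w in zip(outcomes, [8, 1, 6, 3, 2, 7, 4, 5])}

def expect(f):
    """Expectation of f(x, y, z) under the joint distribution p."""
    return sum(prob * f(*o) for o, prob in p.items())

x0, y0, z0 = (expect(lambda x, y, z: x),
              expect(lambda x, y, z: y),
              expect(lambda x, y, z: z))
e_xy = expect(lambda x, y, z: x * y)
e_xz = expect(lambda x, y, z: x * z)

# Two-variable marginals p_{11.} and p_{10.} (dot = Z summed out).
p11_ = sum(prob for (x, y, z), prob in p.items() if x == 1 and y == 1)
p10_ = sum(prob for (x, y, z), prob in p.items() if x == 1 and y == -1)

checks = [
    p11_ == Fraction(1, 4) + (x0 + y0 + e_xy) / 4,
    p10_ == (x0 - y0) / 4 + (1 - e_xy) / 4,
    # p_{100} expressed through p_{111}, the means and the product moments.
    p[(1, -1, -1)] == p[(1, 1, 1)] - (y0 + z0) / 4 - (e_xy + e_xz) / 4,
]
print(checks)
```

Exact rational arithmetic is used so that the comparisons are equalities rather than floating-point approximations.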

The next remark is that (iii) is not necessary for the correlations of
three-valued random variables with expectations equal to zero. Let the three
values be 1, 0, −1. Here is a counterexample where each of the three
correlations is $-\frac{1}{2}$, and thus with a sum equal to $-\frac{3}{2}$,
violating (2).

There is a joint probability distribution with the following values. Let
$p(x, y, z)$ be the probability of a given triple of values, e.g., (1, −1, 0).
Then, of course, we must have for all x, y and z

\[ p(x, y, z) \ge 0 \quad \text{and} \quad \sum_{x, y, z} p(x, y, z) = 1, \]

where x, y and z each have the three values 1, 0, −1. So, let

\[ p(-1, 0, 1) = p(1, -1, 0) = p(0, 1, -1) = p(1, 0, -1) = p(-1, 1, 0) = p(0, -1, 1) = \frac{1}{6} \]

and the other 21 $p(x, y, z) = 0$. Then it is easy to show that in this model
$E(X) = E(Y) = E(Z) = 0$, $\mathrm{Var}(X) = \mathrm{Var}(Y) = \mathrm{Var}(Z) = \frac{2}{3}$,
and $\mathrm{Cov}(X, Y) = \mathrm{Cov}(Y, Z) = \mathrm{Cov}(X, Z) = -\frac{1}{3}$,
so that the correlations are

\[ \rho(X, Y) = \rho(Y, Z) = \rho(X, Z) = -\frac{1}{2}. \]

Note that in the example just given the covariances for the three-valued
random variables, with the joint distribution as stated, do satisfy (iii) of
Theorem 5.

For the third promised case, it is easy to construct a counterexample for
covariances of three-valued random variables with values −2, 0, 2 and
expectations zero. We use the same distribution for these new values:
$p(-2, 0, 2) = p(2, -2, 0) = p(0, 2, -2) = p(2, 0, -2) = p(-2, 2, 0) = p(0, -2, 2) = \frac{1}{6}$.
It is easy to see at once that

\[ \mathrm{Cov}(X, Y) = \mathrm{Cov}(Y, Z) = \mathrm{Cov}(X, Z) = -\frac{4}{3}, \]

and so (iii) of Theorem 5 is not satisfied by these covariances.
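Both three-valued counterexamples can be recomputed directly from the stated distributions. The sketch below is illustrative only; the helper names `moments` and `satisfies_iii` are assumptions of this example. Because the three variances are equal in each model, the correlation is obtained as covariance divided by variance, which avoids square roots and keeps the arithmetic exact.

```python
from fractions import Fraction
from itertools import permutations

def moments(values):
    """Means, variances and covariances when the six permutations of
    `values` are the only outcomes, each with probability 1/6."""
    support = list(permutations(values))
    w = Fraction(1, len(support))
    mean = [sum(w * t[i] for t in support) for i in range(3)]
    var = [sum(w * t[i] ** 2 for t in support) - mean[i] ** 2 for i in range(3)]
    cov = {
        (i, j): sum(w * t[i] * t[j] for t in support) - mean[i] * mean[j]
        for i, j in [(0, 1), (1, 2), (0, 2)]
    }
    return mean, var, cov

def satisfies_iii(a, b, c):
    """Inequalities (iii) of Theorem 5 applied to three numbers."""
    return -1 <= a + b + c <= 1 + 2 * min(a, b, c)

mean1, var1, cov1 = moments((-1, 0, 1))
corr1 = {pair: c / var1[0] for pair, c in cov1.items()}  # equal variances
mean2, var2, cov2 = moments((-2, 0, 2))

print(var1[0], cov1[(0, 1)], corr1[(0, 1)], cov2[(0, 1)])
print(satisfies_iii(*cov1.values()),    # covariances, values -1, 0, 1
      satisfies_iii(*corr1.values()),   # correlations, values -1, 0, 1
      satisfies_iii(*cov2.values()))    # covariances, values -2, 0, 2
```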

It is a somewhat depressing mathematical fact that even for three random 
variables with n-values and expectations equal to zero, a separate investiga- 
tion seems to be needed for each n to find necessary and sufficient conditions 
to have a joint probability distribution compatible with given means, vari- 
ances and covariances or correlations. A more general recursive result would 
be highly desirable, but seems not to be known. Such results are pertinent 
to the study of multi-valued spin phenomena, the discussion of which we 
continue after the next theorem. 

Bell's original inequality. We now return to Theorem 5 for another look
at the inequalities (iii), which assume $E(X) = E(Y) = E(Z) = 0$. How do
these inequalities relate to Bell's well-known inequality written in terms
of the bivariate product moments,

\[ 1 + E(YZ) \ge |E(XY) - E(XZ)|? \tag{4} \]

Bell's inequality is in fact neither necessary nor sufficient for the existence
of a joint probability distribution of the random variables X, Y and Z with
values ±1 and expectations equal to zero. That it is not sufficient is easily
seen from letting all three covariances equal $-\frac{1}{2}$. Then the
inequality is satisfied, for

\[ \frac{1}{2} \ge 0, \]

but, as is clear from (iii), there can be no joint distribution with the three
covariances equal to $-\frac{1}{2}$, for

\[ -\frac{1}{2} - \frac{1}{2} - \frac{1}{2} < -1. \]

Secondly, Bell's inequality is not necessary. Let $E(XY) = \frac{1}{2}$,
$E(XZ) = -\frac{1}{2}$, and $E(YZ) = -\frac{1}{2}$; then (4) is violated, because

\[ 1 - \frac{1}{2} < \left| \frac{1}{2} - \left(-\frac{1}{2}\right) \right|, \]

but (iii) is satisfied, and so there is a joint distribution:

\[ -1 \le \frac{1}{2} - \frac{1}{2} - \frac{1}{2} \le 1 + 2\,\mathrm{Min}\!\left(\frac{1}{2}, -\frac{1}{2}, -\frac{1}{2}\right), \]

i.e.,

\[ -1 \le -\frac{1}{2} \le 0. \]

Bell derived his inequality for certain cases satisfied by a local
hidden-variable theory, but violated by the quantum mechanical covariance equal
to $-\cos\theta_{ij}$. In particular, let $\theta_{XY} = 30°$,
$\theta_{XZ} = 60°$, $\theta_{YZ} = 30°$, so, geometrically, Y bisects X and Z.
Then

\[ \left| -\frac{\sqrt{3}}{2} + \frac{1}{2} \right| > 1 - \frac{\sqrt{3}}{2}, \]

so (4) is violated.
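The three numerical cases discussed in this subsection can be checked mechanically. The following is a small sketch; the function names are illustrative, and the quantum values are simply $-\cos\theta$ evaluated in floating point at the stated angles.

```python
import math
from fractions import Fraction

def bell_original(e_xy, e_xz, e_yz):
    """Bell's inequality (4): 1 + E(YZ) >= |E(XY) - E(XZ)|."""
    return 1 + e_yz >= abs(e_xy - e_xz)

def theorem5_iii(e_xy, e_xz, e_yz):
    """Inequalities (iii) of Theorem 5 (zero-mean ±1 variables)."""
    total = e_xy + e_xz + e_yz
    return -1 <= total <= 1 + 2 * min(e_xy, e_xz, e_yz)

half = Fraction(1, 2)

# Case 1: all covariances -1/2. Bell holds, yet no joint distribution exists.
case1 = (-half, -half, -half)
# Case 2: E(XY) = 1/2, E(XZ) = E(YZ) = -1/2. Bell fails, yet a joint exists.
case2 = (half, -half, -half)
# Quantum case: covariance -cos(theta) with angles 30, 60, 30 degrees.
quantum = tuple(-math.cos(math.radians(t)) for t in (30, 60, 30))  # XY, XZ, YZ

for name, case in [("case 1", case1), ("case 2", case2), ("quantum", quantum)]:
    print(name, bell_original(*case), theorem5_iii(*case))
```

The first two rows show that (4) is neither sufficient nor necessary for a joint distribution, while the quantum row fails both tests.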



Bell's Inequalities in the CHSH form. The next theorem states two
conditions equivalent to Bell's Inequalities for random variables with just two
values. This form is due to Clauser et al. The equivalence of (ii) and
(iii) was proved by Fine.

Theorem 6 (Bell's Inequalities) Let $n$ random variables be given satisfying
the locality hypothesis of Theorem 4. Let $n = 4$, the number of random
variables, let each $X_i$ be discrete with values ±1, let the symmetry condition
$E(X_i) = 0$, $i = 1, \ldots, 4$, be satisfied, let $X_1 = A$, $X_2 = A'$,
$X_3 = B$, $X_4 = B'$, with the covariances $E(AB)$, $E(AB')$, $E(A'B)$ and
$E(A'B')$ given. Then the following three conditions are equivalent.
(i) There is a hidden variable $\boldsymbol{\lambda}$ satisfying Locality
Condition II and equation (a) of the Second-Order Factorization Condition holds.

(ii) There is a joint probability distribution of the random variables A, A',
B and B' compatible with the given means and covariances.

(iii) The random variables A, A', B and B' satisfy Bell's inequalities in the
CHSH form

\[ -2 \le E(AB) + E(AB') + E(A'B) - E(A'B') \le 2, \]
\[ -2 \le E(AB) + E(AB') - E(A'B) + E(A'B') \le 2, \]
\[ -2 \le E(AB) - E(AB') + E(A'B) + E(A'B') \le 2, \]
\[ -2 \le -E(AB) + E(AB') + E(A'B) + E(A'B') \le 2. \]

It is worth emphasizing that in contrast to Bell's original inequality (4), the
CHSH inequalities with four random variables give necessary and sufficient
conditions for the existence of a joint probability distribution.

It will now be shown that the CHSH inequalities remain valid for three-valued
random variables (spin-1 particles). Consider a spin-1 particle with
the 3-state observables $A(a, \lambda) = +1, 0, -1$ and
$B(b, \lambda) = +1, 0, -1$, where $\lambda$ is a hidden variable having a
normalized probability density $p(\lambda)$. The expectation
of these observables is defined as

\[ E(a, b) = \int A B \, p(\lambda)\, d\lambda. \]

We have suppressed the variable dependence on A and B for clarity. (Note
that in this discussion we follow the notation of physicists, especially as used
by Bell, rather than the standard notation of mathematical statistics for
expectations, including covariances.) Consider the following difference,

\[ E(a, b) - E(a, b') = \int A [B - B'] \, p(\lambda)\, d\lambda. \]

Since the density $p \ge 0$ and $|A| = 1, 0$ we have the following inequality,

\[ |E(a, b) - E(a, b')| \le \int |B - B'| \, p(\lambda)\, d\lambda. \]
Similarly we have the following inequality,

\[ |E(a', b) + E(a', b')| = \left| \int A' [B + B'] \, p(\lambda)\, d\lambda \right| \le \int |B + B'| \, p(\lambda)\, d\lambda. \]

Adding the two expressions we arrive at the following inequality,

\[ |E(a, b) - E(a, b')| + |E(a', b) + E(a', b')| \le \int \big[\, |B - B'| + |B + B'| \,\big] \, p(\lambda)\, d\lambda. \]

The term in square brackets is equal to 2 in all cases except when B and B'
are both equal to zero, in which case it vanishes. With this
and the normalization condition for the hidden variable density we have the
same inequality as the spin-$\frac{1}{2}$ CHSH inequality,

\[ |E(a, b) - E(a, b')| + |E(a', b) + E(a', b')| \le 2. \]

Note that we could create a stronger inequality by adding the function
$2(|E(a, b)| - 1)(|E(a, b')| - 1)$ to the left-hand side.
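The key step of the spin-1 argument is a statement about finitely many values, so it can be verified by enumeration. The sketch below tabulates $|B - B'| + |B + B'|$ over the nine value pairs, computes the largest value of the CHSH expression over all deterministic assignments in {−1, 0, 1}, and adds a helper for the four two-sided CHSH inequalities. The helper names are illustrative. The final example uses the product moments $\pm 1/\sqrt{2}$, the familiar singlet-state values for a suitable choice of angles; those numbers are an assumption of this sketch and are not taken from the text.

```python
import math
from itertools import product

SPIN1 = (-1, 0, 1)

# The bracketed term |B - B'| + |B + B'| for every pair of spin-1 values.
bracket = {(b, b2): abs(b - b2) + abs(b + b2) for b, b2 in product(SPIN1, repeat=2)}

def chsh_lhs(a, a2, b, b2):
    """|AB - AB'| + |A'B + A'B'| for one deterministic assignment."""
    return abs(a * b - a * b2) + abs(a2 * b + a2 * b2)

max_lhs = max(chsh_lhs(*v) for v in product(SPIN1, repeat=4))

def satisfies_chsh(e_ab, e_ab2, e_a2b, e_a2b2):
    """All four two-sided CHSH inequalities for given product moments."""
    combos = [
        e_ab + e_ab2 + e_a2b - e_a2b2,
        e_ab + e_ab2 - e_a2b + e_a2b2,
        e_ab - e_ab2 + e_a2b + e_a2b2,
        -e_ab + e_ab2 + e_a2b + e_a2b2,
    ]
    return all(-2 <= c <= 2 for c in combos)

r = 1 / math.sqrt(2)
print(sorted(set(bracket.values())), bracket[(0, 0)], max_lhs)
print(satisfies_chsh(0.5, 0.5, 0.5, 0.5), satisfies_chsh(-r, -r, -r, r))
```

Since every deterministic assignment stays within the bound 2, so does any average over a normalized density, which is the content of the derivation above.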

Higher Spin Cases. For higher spins we can proceed analogously and
derive the following inequality which must be satisfied for spin-$j$ particles,

\[ |E(a, b) - E(a, b')| + |E(a', b) + E(a', b')| \le 2j. \]

If we define normalized observables, the original CHSH inequality will
need to be satisfied for local hidden variable theories, although stronger
inequalities could be constructed.

In Peres' work on higher spin particles the observable is defined by a
mapping from the $(2j + 1)$-state $J_z$ operator to a two-state operator.
Under this mapping it was shown that Bell's inequality is violated for certain
parameter settings of the detectors.

The mapping from many values to ±1, as used by Peres and others, is
justified probabilistically by the following theorem, which provides a way
of avoiding deriving separate inequalities for each of the higher spin cases
($n > 2$).
Theorem 7 Let $X_1, \ldots, X_n$ be $n$ random variables with joint probability
distribution $F(x_1, \ldots, x_n)$. Let $f_1, \ldots, f_k$ be finite-valued
measurable functions of the random variables $X_1, \ldots, X_n$, with
$y_1 = f_1(x_1, \ldots, x_n), \ldots, y_k = f_k(x_1, \ldots, x_n)$. Then there
is a function $G(y_1, \ldots, y_k)$, unique up to sets of measure zero, that
determines the joint probability distribution of the random variables
$Y_1, \ldots, Y_k$ that are functions of $X_1, \ldots, X_n$.

Idea of the proof. We only sketch the proof for a simple finite case to avoid 
technical details, for the underlying idea is very intuitive. 

Let X, Y and Z be ±1 random variables with a joint distribution. Let A
and B be random variables that are functions of X, Y and Z. In particular,
let

\[ A = f(X, Y) = X + Y, \qquad B = f(Y, Z) = Y + Z. \]

Then it is easy to see that the range of values of A and B is {−2, 0, 2}. More
importantly, the joint distribution of A and B is easily computed from the
joint distribution of X, Y and Z. Of the nine possible pairs of values for
the joint distribution, we show four; the remaining five are very similar:

\[ P(A = -2 \,\&\, B = -2) = P(X = -1 \,\&\, Y = -1 \,\&\, Z = -1), \]
\[ P(A = -2 \,\&\, B = 0) = P(X = -1 \,\&\, Y = -1 \,\&\, Z = 1), \]
\[ P(A = -2 \,\&\, B = 2) = 0, \]
\[ P(A = 0 \,\&\, B = 0) = P\big((X = -1 \,\&\, Y = 1 \,\&\, Z = -1) \text{ or } (X = 1 \,\&\, Y = -1 \,\&\, Z = 1)\big). \]
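The computation in this proof sketch amounts to summing the probabilities of all outcomes that are mapped to the same pair of function values. A minimal sketch follows, assuming an arbitrary (hypothetical) joint distribution of X, Y and Z; the helper name `pushforward` is illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution of three ±1 random variables.
outcomes = list(product([-1, 1], repeat=3))
joint_xyz = {o: Fraction(w, 36) for o, w in zip(outcomes, range(1, 9))}

def pushforward(joint, *functions):
    """Joint distribution of the random variables f(X, Y, Z), one per function."""
    result = defaultdict(Fraction)
    for outcome, prob in joint.items():
        key = tuple(f(*outcome) for f in functions)
        result[key] += prob
    return dict(result)

# A = X + Y and B = Y + Z, as in the proof sketch.
joint_ab = pushforward(joint_xyz, lambda x, y, z: x + y, lambda x, y, z: y + z)

for pair in sorted(joint_ab):
    print(pair, joint_ab[pair])
```

The pairs (−2, 2) and (2, −2) never appear, because A = −2 forces Y = −1 while B = 2 forces Y = 1, and conversely.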

The following partial converse of Theorem 7 is really what is implicit in 
the reduction of higher spin cases to just two values, rather than Theorem 7 
itself. For simplicity of formulation we restrict the statement of the theorem 
to four random variables, using the familiar notation of Theorem 6, and 
also restrict the functions to functions of a single random variable, with the 
additional constraint that the functions have only the values ±1. 

Theorem 8 Let A, B, A', B' be random variables with means, variances
and covariances given, but with no assumption of a joint distribution. Let
$f_A$, $f_B$, $f_{A'}$, $f_{B'}$ be finite-valued measurable functions of the
respective random variables and let the functions have only the values ±1. If
there is no joint distribution of $f_A(A)$, $f_B(B)$, $f_{A'}(A')$ and
$f_{B'}(B')$ compatible with the means, variances and covariances of the
functional random variables, then there is no joint distribution of A, B, A',
B' compatible with the given means, variances and covariances.

GHZ Probabilistic Theorem. Changing the focus, we now consider an
abstract version of the GHZ gedanken experiment. All arguments known
to us, in particular GHZ's own argument, a more extended version of it,
and Mermin's, proceed by assuming the existence of a deterministic hidden
variable and then deriving a contradiction. It follows immediately from
Theorem 1 that the nonexistence of a hidden variable is equivalent to the
nonexistence of a joint probability distribution for the given observable
random variables. The next theorem states this purely probabilistic GHZ result,
and, more importantly, the proof is purely in terms of the observables, with
no consideration of possible hidden variables.

Theorem 9 (Abstract GHZ version). Let $A_{\varphi_1}, B_{\varphi_2}, C_{\varphi_3}, D_{\varphi_4}$
be an infinite family of ±1 random variables, with $\varphi_i$ a periodic angle
or phase, $0 \le \varphi_i \le 2\pi$, and let the following condition hold:

\[ E(A_{\varphi_1} B_{\varphi_2} C_{\varphi_3} D_{\varphi_4}) = -\cos(\varphi_1 + \varphi_2 - \varphi_3 - \varphi_4). \tag{5} \]

Then the finite subset of random variables $A_0, B_0, C_0, D_0, A_\pi, A_{\pi/2}, C_{\pi/2}, D_{\pi/2}$
does not have a joint probability distribution.

Proof. We note first, as an immediate consequence of (5),

(i) if $\varphi_1 + \varphi_2 - \varphi_3 - \varphi_4 = 0$ then $E(A_{\varphi_1} B_{\varphi_2} C_{\varphi_3} D_{\varphi_4}) = -1$,

(ii) if $\varphi_1 + \varphi_2 - \varphi_3 - \varphi_4 = \pi$ then $E(A_{\varphi_1} B_{\varphi_2} C_{\varphi_3} D_{\varphi_4}) = 1$.

The proof proceeds by deriving a contradiction from the supposition of the
existence of a joint probability distribution. Because conditional
probabilities are used repeatedly, we must check that the given condition in
each such probability has positive probability. Let $s_i$, $i = 1, \ldots, 4$,
be +1 or −1. One of the 16 products of the four signs must have positive
probability, in the sense that

\[ P(A_0 = s_1, B_0 = s_2, C_0 = s_3, D_0 = s_4) > 0. \tag{6} \]

(We do not need to know whether each $s_i$ is +1 or −1.) Then since the angles
sum to 0, the product

\[ s_1 s_2 s_3 s_4 = -1. \tag{7} \]

We also can infer at once from (6) and (ii)

\[ P(A_\pi = s_2 s_3 s_4 \mid B_0 = s_2, C_0 = s_3, D_0 = s_4) = 1, \tag{8} \]

since (6) ensures that the condition in (8) has positive probability. Using (i)
now, by a similar argument

\[ P(A_0 C_0 = -s_2 s_4 \mid B_0 = s_2, D_0 = s_4) = 1, \tag{9} \]

and from (9) and familiar facts about probability-1 propositions (see Lemma
1 of the Appendix), we may add $C_0 = s_3$ to the condition in (9) to obtain

\[ P(A_0 C_0 = -s_2 s_4 \mid B_0 = s_2, C_0 D_0 = s_3 s_4) = 1. \tag{10} \]

Using (i) and (6) again,

\[ P(A_{\pi/2} C_{\pi/2} = -s_2 s_4 \mid B_0 = s_2, C_0 D_0 = s_3 s_4) = 1. \tag{11} \]

And so, using Lemma 2 of the Appendix and (10) and (11), we infer

\[ P(A_0 C_0 = A_{\pi/2} C_{\pi/2} \mid B_0 = s_2, C_0 D_0 = s_3 s_4) = 1. \tag{12} \]

By an argument just like that of (9)-(12), we also infer

\[ P(A_0 D_0 = A_{\pi/2} D_{\pi/2} \mid B_0 = s_2, C_0 D_0 = s_3 s_4) = 1. \tag{13} \]

Dividing the equation of (12) by that of (13), we get

\[ P\!\left( \frac{C_0}{D_0} = \frac{C_{\pi/2}}{D_{\pi/2}} \;\Big|\; B_0 = s_2, C_0 D_0 = s_3 s_4 \right) = 1, \tag{14} \]

and since the random variables have only values +1 and −1, we may rewrite
(14) as:

\[ P(C_0 D_0 = C_{\pi/2} D_{\pi/2} \mid B_0 = s_2, C_0 D_0 = s_3 s_4) = 1. \tag{15} \]

From (15) and Lemma 3 of the Appendix we get

\[ P(C_{\pi/2} D_{\pi/2} = s_3 s_4 \mid B_0 = s_2, C_0 D_0 = s_3 s_4) = 1, \tag{16} \]

and so immediately we may infer from (6) and (16)

\[ P(B_0 = s_2, C_{\pi/2} D_{\pi/2} = s_3 s_4) > 0. \tag{17} \]

Then from (i) and (17)

\[ P(A_\pi = -s_2 s_3 s_4 \mid B_0 = s_2, C_{\pi/2} D_{\pi/2} = s_3 s_4) = 1, \tag{18} \]

and finally from (15) and (18) and Lemma 5 of the Appendix

\[ P(A_\pi = -s_2 s_3 s_4 \mid B_0 = s_2, C_0 D_0 = s_3 s_4) = 1. \tag{19} \]

Obviously, (8) and (19) together yield the desired contradiction.
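The conclusion of Theorem 9 can also be seen by exhaustive search. Condition (5) forces the product of four ±1 variables to equal −1 with probability one when the angles combine to 0, and +1 when they combine to π. A joint distribution of the eight variables would therefore have to be concentrated on sign assignments that satisfy all such product constraints at once. The sketch below enumerates the 256 assignments against five quadruples drawn from the eight variables; the variable names, the choice of quadruples and the encoding of angles in units of π/2 are assumptions of this illustration, not part of the original proof.

```python
from itertools import product

# Angles in units of pi/2: 0 -> 0, 1 -> pi/2, 2 -> pi.
ANGLE = {"A0": 0, "B0": 0, "C0": 0, "D0": 0,
         "Api": 2, "Ahalf": 1, "Chalf": 1, "Dhalf": 1}
NAMES = list(ANGLE)

# -cos(phi) for the two angle sums that occur: -cos(0) = -1, -cos(pi) = +1.
MINUS_COS = {0: -1, 2: 1}

# Quadruples (A, B, C, D) built from the eight variables of Theorem 9.
QUADRUPLES = [
    ("A0", "B0", "C0", "D0"),
    ("Api", "B0", "C0", "D0"),
    ("Ahalf", "B0", "Chalf", "D0"),
    ("Ahalf", "B0", "C0", "Dhalf"),
    ("Api", "B0", "Chalf", "Dhalf"),
]

def required_product(a, b, c, d):
    """Value of -cos(phi_1 + phi_2 - phi_3 - phi_4) for the given quadruple."""
    total = (ANGLE[a] + ANGLE[b] - ANGLE[c] - ANGLE[d]) % 4
    return MINUS_COS[total]

def consistent(assignment):
    """True if the sign assignment meets every product constraint."""
    return all(
        assignment[a] * assignment[b] * assignment[c] * assignment[d]
        == required_product(a, b, c, d)
        for a, b, c, d in QUADRUPLES
    )

solutions = [
    values for values in product((1, -1), repeat=len(NAMES))
    if consistent(dict(zip(NAMES, values)))
]
print(len(solutions))
```

No assignment survives, so no probability distribution over the eight variables can reproduce the required expectations.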



Gaussian random variables. A fundamental second-order theorem about 
finite sequences of continuous random variables is the following: 

Theorem 10 Let n continuous random variables be given, let their means,
variances and covariances all exist and be finite, with all the variances nonzero. 
Then a necessary and sufficient condition that a joint Gaussian probability 
distribution of the n random variables exists, compatible with the given means, 
variances and covariances, is that the eigenvalues of the correlation matrix 
be nonnegative. 

A thorough discussion and proof of this theorem can be found in Loève.
It is important to note that the hypothesis of this theorem is that each pair 
of the random variables has enough postulated for there to exist a unique bi- 
variate Gaussian distribution with the given pair of means and variances and 
the covariance of the pair. Moreover, if, as required for a joint distribution of 
all n variables, the eigenvalues of the correlation matrix are all nonnegative, 
then there is a unique Gaussian joint distribution of the n random variables. 

We formulate the next theorem to include cases like Bell's inequalities 
when not all the correlations or covariances are given. 

Theorem 11 Let $n$ continuous random variables be given such that they
satisfy the locality hypothesis of Theorem 4, let their means and variances
exist and be finite, with all the variances nonzero, and let $m \le n(n-1)/2$
covariances be given and be finite. Then the following two conditions are
equivalent.

(i) There is a joint Gaussian probability distribution of the $n$ random
variables compatible with the given means, variances and covariances.

(ii) Given the $m \le n(n-1)/2$ covariances, there are real numbers that may
be assigned to the missing correlations so that the completed correlation
matrix has eigenvalues that are all nonnegative.

Moreover, (i) or (ii) implies that there is a hidden variable
$\boldsymbol{\lambda}$ satisfying Locality Condition II and the Second-Order
Factorization Condition.

The proof of Theorem 11 follows directly from Theorem 10. 
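For three variables with one missing correlation, condition (ii) can be made explicit. Requiring the determinant of the completed 3 × 3 correlation matrix to be nonnegative gives a quadratic condition on the missing entry, whose solution set is the interval $\rho_{XY}\rho_{YZ} \pm \sqrt{(1 - \rho_{XY}^2)(1 - \rho_{YZ}^2)}$. The sketch below computes that interval; it covers only this special case, and the function names are illustrative.

```python
import math

def completion_interval(rho_xy, rho_yz):
    """Values of the missing correlation rho_xz for which the completed
    3x3 correlation matrix has no negative eigenvalue."""
    center = rho_xy * rho_yz
    radius = math.sqrt((1 - rho_xy ** 2) * (1 - rho_yz ** 2))
    return center - radius, center + radius

def determinant(rho_xy, rho_yz, rho_xz):
    """Determinant of the 3x3 correlation matrix."""
    return (1 + 2 * rho_xy * rho_yz * rho_xz
            - rho_xy ** 2 - rho_yz ** 2 - rho_xz ** 2)

low, high = completion_interval(0.6, 0.8)
print(low, high)
print(determinant(0.6, 0.8, 0.5) >= 0, determinant(0.6, 0.8, -0.5) >= 0)
```

The interval is never empty when both given correlations lie in [−1, 1], so with a single missing correlation among three variables a Gaussian completion always exists.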

Using Theorem 10, we can also derive a nonlinear inequality necessary and 
sufficient for three Gaussian random variables to have a joint distribution. 
In the statement of the theorem $\rho(X, Y)$ is the correlation of X and Y. 

Theorem 12 Let X, Y and Z be three Gaussian random variables whose 
means, variances and correlations are given, and whose variances are nonzero. 
Then there exists a joint Gaussian distribution of X, Y and Z (necessarily 
unique) compatible with the given means, variances and correlations if and 
only if 

\[ \rho(X,Y)^2 + \rho(X,Z)^2 + \rho(Y,Z)^2 \leq 2\rho(X,Y)\rho(Y,Z)\rho(X,Z) + 1. \]

The proof comes directly from the determinant of the correlation matrix. 
For a matrix to be non-negative definite the determinant of the entire matrix 
and all principal minors must be greater than or equal to zero, 



\[
\det \begin{pmatrix}
1 & \rho(X,Y) & \rho(X,Z) \\
\rho(X,Y) & 1 & \rho(Y,Z) \\
\rho(X,Z) & \rho(Y,Z) & 1
\end{pmatrix} \geq 0. \tag{20}
\]
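
For reference, the expansion of the determinant in (20) along its first row, a step the text leaves implicit, can be written out as follows (with the abbreviations $a = \rho(X,Y)$, $b = \rho(X,Z)$, $c = \rho(Y,Z)$ introduced here only for brevity):

\[
\begin{aligned}
\det \begin{pmatrix} 1 & a & b \\ a & 1 & c \\ b & c & 1 \end{pmatrix}
&= 1\,(1 - c^2) - a\,(a - bc) + b\,(ac - b) \\
&= 1 + 2abc - a^2 - b^2 - c^2 .
\end{aligned}
\]

Requiring this expression to be nonnegative is exactly the inequality stated in Theorem 12.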



Including the conditions for the minors we have 

\[
\begin{aligned}
\rho(X,Y)^2 + \rho(X,Z)^2 + \rho(Y,Z)^2 - 2\rho(X,Y)\rho(X,Z)\rho(Y,Z) &\leq 1, \\
\rho(X,Y)^2 &\leq 1, \\
\rho(Y,Z)^2 &\leq 1, \\
\rho(X,Z)^2 &\leq 1.
\end{aligned} \tag{21}
\]

The last three inequalities are automatically satisfied since the correlations 
are bounded by ±1. 

Simultaneous observations and joint distributions. When observations 
are simultaneous and the environment is stable and stationary, so that 
with repeated simultaneous observations satisfactory frequency data can be 
obtained, then there exists a joint distribution of all of the random variables 
representing the simultaneous observations. Note what we can then con- 
clude from the above: in all such cases there must be, therefore, a factorizing 
hidden variable because of the existence of the joint probability distribution. 
From this consideration alone, it follows that any of the quantum mechanical 
examples that violate Bell's inequalities or other criteria for hidden variables 
must be such that not all the observations in question can be made simul- 
taneously. The extension of this criterion of simultaneity to a satisfactory 
relativistic criterion is straightforward. 

1 Appendix 

We prove here several elementary lemmas about probability-1 statements 
used in the proof of Theorem 9. 

Lemma 1 If $P(A \mid B) = 1$ and $P(BC) > 0$, then $P(A \mid BC) = 1$. 

Proof. Suppose, by way of contradiction, that 

\[ P(A \mid BC) < 1. \tag{22} \]

Now from (22) and the definition of conditional probability, we have at once 

\[ P(ABC) < P(BC). \tag{23} \]

Adding $P(AB\bar{C})$ to both sides of (23) and simplifying, we have 

\[ P(AB) < P(BC) + P(AB\bar{C}). \tag{24} \]

We now take conditional probabilities with respect to B, dividing both 
sides of (24) by $P(B)$, which is positive because $P(BC) > 0$ by the 
hypothesis of the lemma, and thus we obtain 

\[ P(A \mid B) < P(C \mid B) + P(A\bar{C} \mid B), \]

but 

\[ P(C \mid B) + P(A\bar{C} \mid B) \leq P(C \mid B) + P(\bar{C} \mid B) = 1, \]

and by the hypothesis of the lemma 

\[ P(A \mid B) = 1, \]

whence we have derived the absurdity that 1 < 1. Thus the lemma is 
established. 
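
As a sanity check rather than a proof, Lemma 1 can be verified exhaustively on a small finite probability space using exact rational arithmetic. The four-point space, its weights (including one point of probability zero), and the helper names below are hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical sample space; w4 has probability zero on purpose.
prob = {"w1": Fraction(1, 2), "w2": Fraction(1, 4),
        "w3": Fraction(1, 4), "w4": Fraction(0)}


def P(event):
    """Probability of a set of sample points."""
    return sum((prob[w] for w in event), Fraction(0))


def cond(event, given):
    """Conditional probability P(event | given); requires P(given) > 0."""
    return P(event & given) / P(given)


def all_events(space):
    """Every subset of the sample space, as frozensets."""
    points = sorted(space)
    return [frozenset(s) for k in range(len(points) + 1)
            for s in combinations(points, k)]


events = all_events(prob)
checked = 0
for A in events:
    for B in events:
        if P(B) == 0 or cond(A, B) != 1:
            continue  # hypothesis P(A | B) = 1 is not met
        for C in events:
            if P(B & C) > 0:
                assert cond(A, B & C) == 1  # conclusion of Lemma 1
                checked += 1

print("Lemma 1 held in all", checked, "applicable cases")
```

The zero-probability point matters: it lets A miss part of B while still having P(A | B) = 1, which is the situation the lemma is meant to cover.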

Lemma 2 Let X and Y be two random variables with a joint distribution, 
and let 

(i) $P(A) > 0$, 

(ii) $P(X = c \mid A) = 1$, 

(iii) $P(Y = c \mid A) = 1$. 

Then 

\[ P(X = Y \mid A) = 1. \]

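Proof. Let 

\[
\begin{aligned}
B &= \{\omega : X(\omega) = c\}, \\
C &= \{\omega : Y(\omega) = c\}, \\
D &= \{\omega : X(\omega) = Y(\omega)\}.
\end{aligned}
\]

Suppose by way of contradiction that 

\[ P(D \mid A) < 1. \]

Then 

\[ P(\{\omega : X(\omega) \neq Y(\omega)\} \mid A) > 0. \]

And so, since $X(\omega) \neq Y(\omega)$ implies that at least one of the two 
values differs from c, 

\[ P(\{\omega : X(\omega) \neq c \text{ or } Y(\omega) \neq c\} \mid A) > 0. \]

Without loss of generality let 

\[ P(\{\omega : X(\omega) \neq c\} \mid A) > 0. \]

Then 

\[ P(\bar{B} \mid A) > 0, \]

and this contradicts (ii). 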

We also need a sort of converse of Lemma 2. 

Lemma 3 If $P(A \,\&\, X = c) > 0$ and $P(X = Y \mid A \,\&\, X = c) = 1$, then 

\[ P(Y = c \mid A \,\&\, X = c) = 1. \]

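Proof. By hypothesis 

\[ P(X = Y \,\&\, A \,\&\, X = c) = P(A \,\&\, X = c). \]

Consider now the left-hand side: 

\[
\begin{aligned}
\{\omega : X(\omega) = Y(\omega)\} \cap \{\omega : X(\omega) = c\}
&= \{\omega : X(\omega) = c \,\&\, Y(\omega) = c\} \\
&= \{\omega : X(\omega) = c\} \cap \{\omega : Y(\omega) = c\},
\end{aligned}
\]

and so 

\[ P(X = Y \,\&\, A \,\&\, X = c) = P(Y = c \,\&\, A \,\&\, X = c), \]

and thus 

\[ P(Y = c \,\&\, A \,\&\, X = c) = P(A \,\&\, X = c), \]

whence 

\[ P(Y = c \mid A \,\&\, X = c) = 1. \]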

We can also prove a kind of transitivity for conditional probabilities that 
are 1. 

Lemma 4 If $P(B) > 0$, $P(C) > 0$, $P(A \mid B) = 1$ and $P(B \mid C) = 1$, then 
$P(A \mid C) = 1$. 

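Proof. By hypothesis and Lemma 1 (which applies because $P(B \mid C) = 1$ 
and $P(C) > 0$ give $P(BC) = P(C) > 0$) 

\[ P(A \mid BC) = 1, \]

so 

\[ P(ABC) = P(BC), \]

but by hypothesis 

\[ P(BC) = P(C), \]

so 

\[ P(ABC) = P(C), \]

and thus 

\[ P(AB \mid C) = 1, \]

whence 

\[ P(A \mid C) = 1. \]
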
Finally, we also use the following, 

Lemma 5 If $P(A \,\&\, Y = d) > 0$, $P(A \,\&\, Z = d) > 0$, and 

(i) $P(X = c \mid A \,\&\, Y = d) = 1$, 

(ii) $P(Z = Y \mid A \,\&\, Z = d) = 1$, 

then 

\[ P(X = c \mid A \,\&\, Z = d) = 1. \]

Proof. By Lemma 3 and (ii) 

\[ P(A \,\&\, Y = d \mid A \,\&\, Z = d) = 1. \]

So, by transitivity (Lemma 4) and (i), 

\[ P(X = c \mid A \,\&\, Z = d) = 1. \]
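
Lemma 5 can likewise be checked by brute force on a small space. The sketch below enumerates every choice of 0/1-valued random variables X, Y, Z, every event A, and every pair of values c, d on a hypothetical three-point space, and confirms that the conclusion holds whenever the hypotheses do. The space, its weights, and the function names are assumptions made for this illustration only.

```python
from fractions import Fraction
from itertools import product

# Hypothetical three-point space; the third point has probability zero, so
# "probability 1" is weaker than "holds at every point".
weights = [Fraction(1, 2), Fraction(1, 2), Fraction(0)]
points = range(len(weights))


def P(pred):
    """Probability of the set of points satisfying the predicate."""
    return sum((weights[w] for w in points if pred(w)), Fraction(0))


def lemma5_holds(X, Y, Z, A, c, d):
    """Return None if the hypotheses fail, else whether the conclusion holds."""
    a_y = lambda w: A[w] and Y[w] == d   # the event A & Y = d
    a_z = lambda w: A[w] and Z[w] == d   # the event A & Z = d
    if P(a_y) == 0 or P(a_z) == 0:
        return None
    hyp_i = P(lambda w: a_y(w) and X[w] == c) == P(a_y)
    hyp_ii = P(lambda w: a_z(w) and Z[w] == Y[w]) == P(a_z)
    if not (hyp_i and hyp_ii):
        return None
    return P(lambda w: a_z(w) and X[w] == c) == P(a_z)


# All 0/1-valued functions on the space; A is encoded as an indicator tuple.
values = list(product([0, 1], repeat=len(weights)))
outcomes = [lemma5_holds(X, Y, Z, A, c, d)
            for X, Y, Z, A in product(values, repeat=4)
            for c, d in product([0, 1], repeat=2)]
applicable = [r for r in outcomes if r is not None]

print(len(applicable) > 0 and all(applicable))  # True
```

A conditional probability equal to 1 is tested here as equality of the joint and the conditioning probabilities, which avoids division and keeps the arithmetic exact.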